Improving RAG Accuracy with Re-Ranking
Introduction
Hello, I'm Leona from the New Business Division at Classmethod, Inc.
At Classmethod, Inc., we operate and evaluate a QA chatbot based on RAG (Retrieval-Augmented Generation) to improve the accuracy of internal information search and answering. The system retrieves internal documents related to a user's question, and an LLM generates an answer based on them. In practice, however, users do not always get the information they need.
To address this problem, we try a technique called Re-ranking. Re-ranking takes the documents retrieved by the Retriever and re-orders them in descending order of relevance to the query, using a separate embedding model to score each document. This should improve retrieval quality and, in turn, the chatbot's answer accuracy.
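To make the idea concrete, here is a minimal sketch of embedding-based re-ranking, assuming a hypothetical `embed` function that returns an embedding vector for a piece of text:

```python
# Minimal re-ranking sketch: score each retrieved document against the
# query with cosine similarity and sort by descending relevance.
# `embed` is a placeholder for any embedding-model call.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rerank(query, documents, embed):
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(doc)), doc) for doc in documents]
    return [doc for _, doc in sorted(scored, key=lambda x: x[0], reverse=True)]
```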
Implementation
These are the samples referenced and adapted for this post:
- https://github.com/openai/openai-cookbook/blob/main/examples/Search_reranking_with_cross-encoders.ipynb
- https://github.com/aws-samples/amazon-bedrock-rag-workshop/blob/dcdb2f64f796c53a2e226c57447711843e901bca/05_Semantic_Search_with_Reranking/02_LlamaIndex_Reranker_Bedrock_Titan.ipynb
For embedding models, we use OpenAI's text-embedding-ada-002 and Amazon Bedrock's amazon.titan-embed-text-v1. To use them, you need an OpenAI API key and must enable the corresponding models in Amazon Bedrock. For details on enabling Bedrock models, see the following post:
- Amazon Bedrock をマネジメントコンソールからちょっと触ってみたいときは Base Models(基盤モデル)へのアクセスを設定しましょう
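Once access is set up, both models can be called directly. Below is a minimal sketch, assuming OpenAI credentials in the environment and Bedrock model access enabled (the region is an example):

```python
# Hedged sketch: one helper per embedding model used in this post.
import json
import boto3
from openai import OpenAI

def embed_openai(text: str) -> list[float]:
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return response.data[0].embedding

def embed_titan(text: str, region: str = "us-east-1") -> list[float]:
    # The region is an assumption; use whichever region has Bedrock enabled.
    bedrock = boto3.client("bedrock-runtime", region_name=region)
    response = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v1",
        body=json.dumps({"inputText": text}),
    )
    return json.loads(response["body"].read())["embedding"]
```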
For the experiment, we first need to define a concrete query and prepare documents related to it. Following OpenAI's sample code, we run a query search against the arXiv API, provided by the preprint repository arXiv, and treat each paper's abstract as a document.
Retrieving Paper Information
1. Import the arxiv library and define the query as follows.
```python
# Installing the package via pip makes the arXiv API available:
# pip install arxiv
import arxiv

query = "how do bi-encoders work for sentence embeddings"
client_arxiv = arxiv.Client()
search = arxiv.Search(
    query=query,
    max_results=20,
    sort_by=arxiv.SortCriterion.Relevance,
)
```
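The search can then be executed with the client. Here is a minimal sketch that prints the titles shown in the next step (the `papers` list is reused in later snippets):

```python
# Run the search and keep the results; each result carries the title
# and the abstract (result.summary), which we embed later.
papers = list(client_arxiv.results(search))
for i, paper in enumerate(papers, start=1):
    print(f"{i}: {paper.title}")
```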
2. These are the search results for the query. For comparison, we show only the paper titles.
1: A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation
2: SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features
3: Are Classes Clusters?
4: Semantic Composition in Visually Grounded Language Models
5: Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions
6: Learning Probabilistic Sentence Representations from Paraphrases
7: Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings
8: How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation
9: Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences
10: Vec2Sent: Probing Sentence Embeddings with Natural Language Generation
11: Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings
12: SentPWNet: A Unified Sentence Pair Weighting Network for Task-specific Sentence Embedding
13: Learning Joint Representations of Videos and Sentences with Web Image Search
14: Character-based Neural Networks for Sentence Pair Modeling
15: Train Once, Test Anywhere: Zero-Shot Learning for Text Classification
16: Efficient Domain Adaptation of Sentence Embeddings Using Adapters
17: Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models
18: Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models
19: In Search for Linear Relations in Sentence Embedding Spaces
20: Learning to Borrow -- Relation Representation for Without-Mention Entity-Pairs for Knowledge Graph Completion
Sorting Results
The table below shows how each embedding model sorted the abstracts in descending order of relevance to the query. The re-ranking by each model changed the order relative to the original arXiv ranking.
arXiv original | Amazon Bedrock amazon.titan-embed-text-v1 | OpenAI text-embedding-ada-002 |
---|---|---|
A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation | A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation | Vec2Sent: Probing Sentence Embeddings with Natural Language Generation |
SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features | Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models | Are Classes Clusters? |
Are Classes Clusters? | In Search for Linear Relations in Sentence Embedding Spaces | Semantic Composition in Visually Grounded Language Models |
Semantic Composition in Visually Grounded Language Models | Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models | Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models |
Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions | Are Classes Clusters? | How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation |
Learning Probabilistic Sentence Representations from Paraphrases | Are Classes Clusters? | SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features |
Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings | SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features | Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings |
How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation | SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features | Train Once, Test Anywhere: Zero-Shot Learning for Text Classification |
Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences | Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings | Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences |
Vec2Sent: Probing Sentence Embeddings with Natural Language Generation | Vec2Sent: Probing Sentence Embeddings with Natural Language Generation | A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation |
Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings | Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings | Efficient Domain Adaptation of Sentence Embeddings Using Adapters |
SentPWNet: A Unified Sentence Pair Weighting Network for Task-specific Sentence Embedding | Semantic Composition in Visually Grounded Language Models | Learning Probabilistic Sentence Representations from Paraphrases |
Learning Joint Representations of Videos and Sentences with Web Image Search | Semantic Composition in Visually Grounded Language Models | Learning to Borrow -- Relation Representation for Without-Mention Entity-Pairs for Knowledge Graph Completion |
Character-based Neural Networks for Sentence Pair Modeling | Efficient Domain Adaptation of Sentence Embeddings Using Adapters | Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings |
Train Once, Test Anywhere: Zero-Shot Learning for Text Classification | Efficient Domain Adaptation of Sentence Embeddings Using Adapters | In Search for Linear Relations in Sentence Embedding Spaces |
Efficient Domain Adaptation of Sentence Embeddings Using Adapters | Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences | Character-based Neural Networks for Sentence Pair Modeling |
Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models | Learning Probabilistic Sentence Representations from Paraphrases | SentPWNet: A Unified Sentence Pair Weighting Network for Task-specific Sentence Embedding |
Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models | Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models | Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions |
In Search for Linear Relations in Sentence Embedding Spaces | Learning Joint Representations of Videos and Sentences with Web Image Search | Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models |
Learning to Borrow -- Relation Representation for Without-Mention Entity-Pairs for Knowledge Graph Completion | How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation | Learning Joint Representations of Videos and Sentences with Web Image Search |
Discussion
When using amazon.titan-embed-text-v1, documents with the same title were retrieved more than once. This is a side effect of chunking: each document is split into smaller pieces, which lets the search pinpoint which part of a document is most similar to the query, so multiple chunks of the same paper can appear in the ranking independently.
With llama_index, you can control this splitting by adjusting chunk_size and chunk_overlap.
```python
# Import Bedrock and BedrockEmbedding from llama_index,
# plus the ServiceContext helpers for global configuration.
from llama_index import ServiceContext, set_global_service_context
from llama_index.llms import Bedrock
from llama_index.embeddings import BedrockEmbedding

region = "us-east-1"  # assumption: set this to your Bedrock-enabled region

# Set the parameters for the Titan model
model_kwargs_titan = {
    "stopSequences": [],
    "temperature": 0.0,
    "topP": 0.5,
}

# Create the Bedrock LLM instance
llm = Bedrock(
    model="amazon.titan-text-express-v1",  # changed from amazon.titan-tg1-large
    context_size=512,
    aws_region_name=region,
    additional_kwargs=model_kwargs_titan,
)

# Create the Bedrock embedding model instance
embed_model = BedrockEmbedding.from_credentials(
    aws_profile=None,
    model_name="amazon.titan-embed-text-v1",  # changed from amazon.titan-embed-g1-text-02
)

# Chunk size and overlap control how documents are split
chunk_overlap = 20
chunk_size = 512

# Configure the service context and set it globally
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    chunk_size=chunk_size,
    chunk_overlap=chunk_overlap,
)
set_global_service_context(service_context)
```
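With the service context in place, the retrieval-and-sort step itself can be sketched as follows, assuming the `papers` list from the arXiv search above (llama_index v0.9-era API):

```python
# Wrap each abstract as a Document, index it with the Titan embeddings,
# then retrieve nodes in order of similarity to the query.
from llama_index import Document, VectorStoreIndex

documents = [
    Document(text=paper.summary, metadata={"title": paper.title})
    for paper in papers
]
# Uses the global service context (chunking + Titan embeddings) set above.
index = VectorStoreIndex.from_documents(documents)

retriever = index.as_retriever(similarity_top_k=20)
for node in retriever.retrieve(query):
    print(f"{node.score:.4f}  {node.node.metadata['title']}")
```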
For comparison, we also tried a larger chunk size.
chunk_size=512 | chunk_size=2048 |
---|---|
A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation | A Comparative Study of Sentence Embedding Models for Assessing Semantic Variation |
Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models | In Search for Linear Relations in Sentence Embedding Spaces |
In Search for Linear Relations in Sentence Embedding Spaces | Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models |
Sentence-T5: Scalable Sentence Encoders from Pre-trained Text-to-Text Models | Are Classes Clusters? |
Are Classes Clusters? | Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings |
Are Classes Clusters? | SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features |
SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features | Vec2Sent: Probing Sentence Embeddings with Natural Language Generation |
SBERT studies Meaning Representations: Decomposing Sentence Embeddings into Explainable Semantic Features | Semantic Composition in Visually Grounded Language Models |
Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings | Efficient Domain Adaptation of Sentence Embeddings Using Adapters |
Vec2Sent: Probing Sentence Embeddings with Natural Language Generation | Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences |
Exploiting Twitter as Source of Large Corpora of Weakly Similar Pairs for Semantic Sentence Embeddings | Learning Probabilistic Sentence Representations from Paraphrases |
Semantic Composition in Visually Grounded Language Models | Learning Joint Representations of Videos and Sentences with Web Image Search |
Semantic Composition in Visually Grounded Language Models | How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation |
Efficient Domain Adaptation of Sentence Embeddings Using Adapters | Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings |
Efficient Domain Adaptation of Sentence Embeddings Using Adapters | Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models |
Clustering and Network Analysis for the Embedding Spaces of Sentences and Sub-Sentences | SentPWNet: A Unified Sentence Pair Weighting Network for Task-specific Sentence Embedding |
Learning Probabilistic Sentence Representations from Paraphrases | Character-based Neural Networks for Sentence Pair Modeling |
Hierarchical GPT with Congruent Transformers for Multi-Sentence Language Models | Train Once, Test Anywhere: Zero-Shot Learning for Text Classification |
Learning Joint Representations of Videos and Sentences with Web Image Search | Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions |
How to Probe Sentence Embeddings in Low-Resource Languages: On Structural Design Choices for Probing Task Evaluation | Learning to Borrow -- Relation Representation for Without-Mention Entity-Pairs for Knowledge Graph Completion |
Increasing the chunk size from 512 to 2048 means each chunk covers more text, so the search is less fine-grained and the duplicate titles disappear. The ranking changed in places, but a smaller chunk size makes it possible to retrieve the passages most similar to the query.
With OpenAI's text-embedding-ada-002, "Vec2Sent: Probing Sentence Embeddings with Natural Language Generation" was the most similar document, whereas Amazon Bedrock's amazon.titan-embed-text-v1 ranked it 10th.
Summary
The experiment confirmed that re-ranking documents by their relevance to a query with different embedding models changes the order of the search results. When operating a chatbot, switching models makes it possible to re-rank in a way suited to the purpose, which can be expected to improve retrieval accuracy.
Next Steps
We confirmed that Re-ranking changes the order of search results, but we still need to verify whether it is genuinely useful. The remaining tasks are:
- Define evaluation metrics and analyze the results quantitatively, which we have not done yet (e.g., a ranking metric such as NDCG; see the sketch after this list).
- Qualitatively analyze whether users obtained the information they needed before and after re-ranking.
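As one candidate for the quantitative analysis, here is a minimal sketch of NDCG@k, assuming hypothetical graded relevance labels assigned to each retrieved document:

```python
# NDCG@k: compares a ranking's discounted gain against the ideal ordering.
import math

def dcg(relevances):
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(ranked_relevances, k):
    ideal_dcg = dcg(sorted(ranked_relevances, reverse=True)[:k])
    return dcg(ranked_relevances[:k]) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical labels for the top 5 results before vs. after re-ranking
print(ndcg_at_k([0, 2, 1, 0, 3], k=5))  # before
print(ndcg_at_k([3, 2, 1, 0, 0], k=5))  # after
```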